Abstract

In this paper we propose a wide class of truncated stochastic approxima- 
tion procedures. These procedures have three main characteristics: trunca- 
tions with random moving bounds, a matrix valued random step-size sequence, 
and a dynamically changing random regression function. We establish con- 
vergence and consider several examples to illustrate the results. 
1 Introduction

Stochastic approximation (SA) introduced by Robbins and Monro in 1951 ([20]) was 
created to locate a root of an unknown function when only noisy measurements of the 
function can be observed. SA quickly became very popular, resulting in interesting 
new developments and numerous applications across a wide range of disciplines. 
Comprehensive surveys of the SA technique including some recent developments 
can be found in [3], [j, [H], [15], [I6]. 

In this paper we propose a wide class of truncated SA procedures with moving 
random bounds. While we believe that the proposed class of procedures will find 
its way to a wider range of applications, the main motivation is to accommodate 
applications to parametric statistical estimation theory. Our class of SA procedures 
has three main characteristics: truncations with random moving bounds, a matrix- 
valued random step-size sequence, and a dynamically changing random regression 
function. 

To introduce the main idea, let us first consider the classical problem of finding
a unique zero, say $z^0$, of a real valued function $R(z) : \mathbb{R} \to \mathbb{R}$ when only noisy
measurements of $R$ are available. To estimate $z^0$, consider a sequence defined
recursively as

$$Z_t = \left[ Z_{t-1} + \gamma_t \left( R(Z_{t-1}) + \varepsilon_t \right) \right]_{\alpha_t}^{\beta_t}, \qquad t = 1, 2, \dots \qquad (1.1)$$

where $\varepsilon_t$ is a sequence of zero-mean random variables and $\gamma_t$ is a deterministic
sequence of positive numbers. Here $\alpha_t$ and $\beta_t$ are random variables with
$-\infty \le \alpha_t \le \beta_t \le \infty$ and $[v]_a^b$ is the truncation operator, that is,

$$[v]_a^b = \begin{cases} a & \text{if } v < a, \\ v & \text{if } a \le v \le b, \\ b & \text{if } v > b. \end{cases}$$

We assume that the truncation sequence $[\alpha_t, \beta_t]$ contains $z^0$ for large values of
$t$. For example, if it is known that $z^0$ belongs to $(\alpha, \beta)$, with $-\infty \le \alpha < \beta \le \infty$,
one can consider truncations with expanding bounds to avoid possible singularities
at the endpoints of the interval. That is, we can take $[\alpha_t, \beta_t]$ with some sequences
$\alpha_t \downarrow \alpha$ and $\beta_t \uparrow \beta$. Truncations with expanding bounds may also be useful to
overcome standard restrictions on growth of the corresponding functions.
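
The recursion (1.1) with expanding bounds can be sketched in a few lines of Python. This is only an illustration under assumed settings, not part of the original development: the function names `truncate` and `truncated_sa`, the test function $R(z) = -(z - 1)$, the Gaussian noise and the bounds $\pm\log(t+1)$ are all hypothetical choices made for the demonstration.

```python
import math
import random


def truncate(v, a, b):
    """Truncation operator [v]_a^b from (1.1): clip v to the interval [a, b]."""
    if v < a:
        return a
    if v > b:
        return b
    return v


def truncated_sa(noisy_r, z_start, n_steps, step, bounds):
    """Run Z_t = [Z_{t-1} + gamma_t * noisy_r(Z_{t-1})]_{alpha_t}^{beta_t}.

    noisy_r(z) returns one noisy measurement R(z) + eps_t, step(t) returns
    gamma_t, and bounds(t) returns the pair (alpha_t, beta_t).
    """
    z = z_start
    for t in range(1, n_steps + 1):
        alpha_t, beta_t = bounds(t)
        z = truncate(z + step(t) * noisy_r(z), alpha_t, beta_t)
    return z


# Assumed setting for illustration only: root z0 = 1 and R(z) = -(z - 1).
rng = random.Random(0)
z_final = truncated_sa(
    noisy_r=lambda z: -(z - 1.0) + rng.gauss(0.0, 1.0),
    z_start=0.0,
    n_steps=20000,
    step=lambda t: 1.0 / t,
    bounds=lambda t: (-math.log(t + 1.0), math.log(t + 1.0)),
)
```

In this assumed setting the bounds contain the root once $\log(t+1) > 1$, so the truncation is active only during the first few steps.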

The most interesting case arises when the truncation interval $[\alpha_t, \beta_t]$ represents
our auxiliary knowledge about $z^0$ at step $t$, which is incorporated into the procedure
through the truncation operator. Consider for example a parametric statistical
model. Suppose that $X_1, \dots, X_t$ are independent and identically distributed random
variables and $f(x, \theta)$ is the common probability density function (w.r.t. some $\sigma$-finite
measure) depending on an unknown parameter $\theta \in \mathbb{R}^m$. Consider the recursive
estimation procedure for $\theta$ defined by

$$\hat\theta_t = \hat\theta_{t-1} + \frac{1}{t}\, i(\hat\theta_{t-1})^{-1} \frac{f'^{T}(X_t, \hat\theta_{t-1})}{f(X_t, \hat\theta_{t-1})}, \qquad t \ge 1, \qquad (1.2)$$

where $f'$ is the row-vector of partial derivatives of $f$ w.r.t. the components of $\theta$, $i(\theta)$
is the one-step Fisher information matrix, and $\hat\theta_0$ is some initial value. This
estimator was introduced in ^2] and studied in [10], [13] and [19]. In particular, it
has been shown that under certain conditions the recursive estimator $\hat\theta_t$ is
asymptotically equivalent to the maximum likelihood estimator, i.e., it is consistent and
asymptotically efficient. The analysis of (1.2) can be conducted by rewriting it in
the form of stochastic approximation. Indeed, in the case of (1.2), let us fix $\theta$ and
let $\gamma_t = 1/t$,

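$$R(z) = i(z)^{-1} E^{\theta}\left\{ \frac{f'^{T}(X_t, z)}{f(X_t, z)} \right\} \quad \text{and} \quad \varepsilon_t = i(\hat\theta_{t-1})^{-1} \frac{f'^{T}(X_t, \hat\theta_{t-1})}{f(X_t, \hat\theta_{t-1})} - R(\hat\theta_{t-1})$$

($E^{\theta}$ is expectation w.r.t. $f(x, \theta)$). Then, under the usual regularity assumptions,
$R(\theta) = 0$ and $\varepsilon_t$ is a martingale difference (w.r.t. the filtration $\mathcal{F}_t$ generated by the
observations). So, (1.2) is a standard SA of type (1.1) without truncations (i.e., in
the one dimensional case, $-\alpha_t = \beta_t = \infty$).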
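
As a concrete instance of (1.2), consider the $N(\theta, 1)$ model, which is chosen here purely for illustration and does not appear in the text above. In this model $f'(x,\theta)/f(x,\theta) = x - \theta$ and $i(\theta) = 1$, so the recursion becomes $\hat\theta_t = \hat\theta_{t-1} + (X_t - \hat\theta_{t-1})/t$, which is the running sample mean. The sketch below (the function name `recursive_mle_normal` is hypothetical) checks this numerically.

```python
import random


def recursive_mle_normal(xs, theta0=0.0):
    """Recursion (1.2) for the N(theta, 1) model (an assumed example).

    Here f'(x, theta) / f(x, theta) = x - theta and i(theta) = 1, so the
    update is theta_t = theta_{t-1} + (1/t) * (x_t - theta_{t-1}).
    """
    theta = theta0
    for t, x in enumerate(xs, start=1):
        score = x - theta          # f'/f evaluated at the current estimate
        fisher_info = 1.0          # one-step Fisher information i(theta)
        theta = theta + score / (t * fisher_info)
    return theta


rng = random.Random(1)
sample = [rng.gauss(2.0, 1.0) for _ in range(5000)]
estimate = recursive_mle_normal(sample, theta0=-3.0)
sample_mean = sum(sample) / len(sample)
```

Up to rounding error, `estimate` coincides with `sample_mean` whatever the starting value, because the first step with $\gamma_1 = 1$ replaces $\hat\theta_0$ by $X_1$.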
However, the need for truncations may arise naturally for various reasons. One
obvious consideration is that the functions in the procedure may only be defined
for certain values of the parameter. In this case one would want the procedure
to produce points only from this set. Truncations may also be useful when the
standard assumptions such as restrictions on the growth rate of the relevant
functions are not satisfied. More importantly, truncations may provide a simple tool
to achieve an efficient use of information available in the estimation process. This
information can be auxiliary information about the parameters, e.g. a set, possibly
time dependent, that is known to contain the value of the unknown parameter.
Suppose for instance that a consistent (i.e., convergent), but not necessarily
efficient auxiliary estimator $\tilde\theta_t$ is available having a rate $d_t$. Then one can consider a
truncated procedure with shrinking bounds. The idea is to obtain an asymptotically
efficient estimator by truncating the recursive procedure in a neighbourhood of $\theta$
with $[\alpha_t, \beta_t] = [\tilde\theta_t - \delta_t, \tilde\theta_t + \delta_t]$, $\delta_t \to 0$. Such a procedure is obviously consistent
since $\hat\theta_t \in [\tilde\theta_t - \delta_t, \tilde\theta_t + \delta_t]$ and $\tilde\theta_t \pm \delta_t \to \theta$. However, to construct an efficient
estimator, care should be taken to ensure that the truncation intervals do not shrink to
$\tilde\theta_t$ too rapidly, for otherwise $\hat\theta_t$ will have the same asymptotic properties as $\tilde\theta_t$ (see
[28] for details in the case of AR processes). Since this paper is concerned with
convergence, details of this application are not discussed here. However, since the
procedures with shrinking bounds are particular cases of the general SA procedure
below (see (2.1)), asymptotic distribution and efficiency can be studied in a unified
manner using ideas of SA.

Note that the idea of truncations with moving bounds is not new. For example,
the idea of truncations with shrinking bounds goes back to [10] and [13]. Truncations
with expanding bounds were considered in [1] and also, in the context of recursive 
parametric estimation, in [23] (see also [2B]). Truncations with adaptive truncation 
sets of the Robbins-Monro SA were introduced by Chen and Zhu and further
explored and extended in [6], [2], [30], [31], [17]. The latter algorithms are designed
in such a way that the procedure is pulled back to a certain pre-specified point or a
set every time the sequence leaves the truncation region. As one can see from (1.1)
and (2.1), truncation procedures considered in this paper are quite different from
the ones by Chen and Zhu and are similar to the ones introduced by Andradottir
in [1] (see Remark 2.9). A detailed comparison of these two different approaches is
given in [1].

Let us now consider a discrete time stochastic process $X_1, X_2, \dots$ with the joint
distribution depending on an unknown parameter $\theta \in \mathbb{R}^m$. Then one can consider
the recursive estimator of $\theta$ defined by

$$\hat\theta_t = \hat\theta_{t-1} + \gamma_t(\hat\theta_{t-1}) \psi_t(\hat\theta_{t-1}), \qquad t \ge 1, \qquad (1.3)$$

where $\psi_t(v) = \psi_t(X_1, \dots, X_t; v)$, $t = 1, 2, \dots$, are suitably chosen functions which
may, in general, depend on the vector of all past and present random variables
and have the property that the process $\psi_t(\theta)$ is a $P^{\theta}$-martingale difference, i.e.,
$E^{\theta}\{\psi_t(\theta) \mid \mathcal{F}_{t-1}\} = 0$ for each $t$. For example, if $f_t(x, \theta) = f_t(x, \theta \mid X_1, \dots, X_{t-1})$ is
the conditional probability density function of the observation $X_t$ given $X_1, \dots, X_{t-1}$,
then one can obtain a likelihood type estimation procedure by choosing $\psi_t(v) =
l_t(v) = f_t'^{T}(X_t, v) / f_t(X_t, v)$. Asymptotic behaviour of this type of procedures for non
i.i.d. models was studied by a number of authors, see e.g., [7], [9], [H], [21] -
[27]. Results in [27] show that to obtain an estimator with asymptotically optimal
properties, one has to consider a state-dependent matrix-valued random step-size
sequence. One possible choice is $\gamma_t(v)$ with the property

$$\gamma_t^{-1}(v) - \gamma_{t-1}^{-1}(v) = E^{\theta}\{\psi_t(v)\, l_t^{T}(v) \mid \mathcal{F}_{t-1}\}.$$

In particular, to obtain a recursive procedure which is asymptotically equivalent to
the maximum likelihood estimator, one has to consider $l_t(v) = f_t'^{T}(X_t, v) / f_t(X_t, v)$
and $\gamma_t(v) = I_t^{-1}(v)$, where $I_t(v)$ is the conditional Fisher information matrix (see
[27] for details). To rewrite (1.3) in a SA form, let us assume that $\theta$ is an arbitrary
but fixed value of the parameter and define

$$R_t(z) = E^{\theta}\{\psi_t(z) \mid \mathcal{F}_{t-1}\} \quad \text{and} \quad \varepsilon_t(z) = \psi_t(z) - R_t(z).$$

Obviously, $R_t(\theta) = 0$ for each $t$, and $\varepsilon_t(z)$ is a martingale difference.

Therefore, to be able to study these procedures in a unified manner, one needs
to consider a SA of the following form

$$Z_t = \left[ Z_{t-1} + \gamma_t(Z_{t-1}) \left( R_t(Z_{t-1}) + \varepsilon_t(Z_{t-1}) \right) \right]_{U_t}, \qquad t = 1, 2, \dots$$

where $R_t(z)$ is predictable with the property that $R_t(z^0) = 0$ for all $t$'s, $\gamma_t(z)$ is
a matrix-valued predictable step-size sequence, $U_t \subset \mathbb{R}^m$ is a random sequence of
truncation sets, and $Z_0 \in \mathbb{R}^m$ is some starting value (see Section 2 for more details).

To summarise the above, the procedures introduced in this paper have the following
features: (1) inhomogeneous random functions $R_t$; (2) state dependent matrix
valued random step sizes; (3) truncations with random and moving (shrinking or
expanding) bounds. These are mainly motivated by parametric statistical
applications. In particular, (1) is required to include recursive parameter estimation
procedures for non i.i.d. models, (2) is needed to guarantee asymptotic optimality
and efficiency of statistical estimation, and (3) is required to accommodate various
adaptive truncations, including the ones arising from auxiliary estimators.
Also, the convergence of these procedures is studied under very general conditions
and the results might be of interest even for the procedures without truncations
(i.e., when $U_t = \mathbb{R}^m$) and with a deterministic and homogeneous regression function
$R_t(z) = R(z)$.

The paper is organised as follows. In Section 2.2 we prove two lemmas on
convergence. The analysis is based on the method of using convergence sets of
nonnegative semimartingales. The decomposition into negative and positive parts
in these lemmas turns out to be very useful in applications (see Example 3 in Section
2.4). In Section 2.3 we give several corollaries in the case of state independent scalar
random step-size sequences. In Section 2.4 we consider examples. Proofs of some
technical parts are postponed to Section 3.



2 Convergence 

2.1 Main objects and notation 

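Let $(\Omega, \mathcal{F}, F = (\mathcal{F}_t)_{t \ge 0}, P)$ be a stochastic basis satisfying the usual conditions.
Suppose that for each $t = 1, 2, \dots$, we have $(\mathcal{B}(\mathbb{R}^m) \times \mathcal{F})$-measurable functions

$$R_t(z) = R_t(z, \omega) : \mathbb{R}^m \times \Omega \to \mathbb{R}^m,$$
$$\varepsilon_t(z) = \varepsilon_t(z, \omega) : \mathbb{R}^m \times \Omega \to \mathbb{R}^m,$$
$$\gamma_t(z) = \gamma_t(z, \omega) : \mathbb{R}^m \times \Omega \to \mathbb{R}^{m \times m},$$

such that for each $z \in \mathbb{R}^m$, the processes $R_t(z)$ and $\gamma_t(z)$ are predictable, i.e.,
$R_t(z)$ and $\gamma_t(z)$ are $\mathcal{F}_{t-1}$ measurable for each $t$. Suppose also that for each
$z \in \mathbb{R}^m$, the process $\varepsilon_t(z)$ is a martingale difference, i.e., $\varepsilon_t(z)$ is $\mathcal{F}_t$ measurable and
$E\{\varepsilon_t(z) \mid \mathcal{F}_{t-1}\} = 0$. We also assume that

$$R_t(z^0) = 0$$

for each $t = 1, 2, \dots$, where $z^0 \in \mathbb{R}^m$ is a non-random vector.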

Suppose that $h = h(z)$ is a real valued function of $z \in \mathbb{R}^m$. We denote by $h'(z)$
the row-vector of partial derivatives of $h$ with respect to the components of $z$, that
is,

$$h'(z) = \left( \frac{\partial}{\partial z_1} h(z), \dots, \frac{\partial}{\partial z_m} h(z) \right).$$

Also, we denote by $h''(z)$ the matrix of second partial derivatives. The $m \times m$
identity matrix is denoted by $\mathbf{1}$.

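Let $U \subset \mathbb{R}^m$ be a closed convex set and define a truncation operator as a function
$[z]_U : \mathbb{R}^m \to \mathbb{R}^m$, such that

$$[z]_U = \begin{cases} z & \text{if } z \in U, \\ z^* & \text{if } z \notin U, \end{cases}$$

where $z^*$ is a point in $U$ that minimizes the distance to $z$.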
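
For two simple closed convex sets the truncation operator $[z]_U$ has an explicit form. The sketch below is an illustration only: the function names `truncate_box` and `truncate_ball` are hypothetical, and the box and the Euclidean ball are example choices of $U$, not sets singled out in the text.

```python
import math


def truncate_box(z, lower, upper):
    """[z]_U for the box U = [lower_1, upper_1] x ... x [lower_m, upper_m]."""
    return [min(max(zi, lo), hi) for zi, lo, hi in zip(z, lower, upper)]


def truncate_ball(z, centre, radius):
    """[z]_U for the closed Euclidean ball U with the given centre and radius."""
    diff = [zi - ci for zi, ci in zip(z, centre)]
    dist = math.sqrt(sum(d * d for d in diff))
    if dist <= radius:
        return list(z)             # z already belongs to U
    scale = radius / dist          # pull z back to the boundary along the ray
    return [ci + scale * d for ci, d in zip(centre, diff)]


inside = truncate_ball([0.3, 0.4], [0.0, 0.0], 1.0)
outside = truncate_ball([3.0, 4.0], [0.0, 0.0], 1.0)
boxed = truncate_box([2.0, -5.0], [0.0, 0.0], [1.0, 1.0])
```

In both cases the truncated point is the nearest point of $U$, so for any $z^0 \in U$ the distance from $[z]_U$ to $z^0$ does not exceed the distance from $z$ to $z^0$.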

Suppose that $z^0 \in \mathbb{R}^m$. We say that a random sequence of sets $U_t = U_t(\omega)$
($t = 1, 2, \dots$) from $\mathbb{R}^m$ is admissible for $z^0$ if

• for each $t$ and $\omega$, $U_t(\omega)$ is a closed convex subset of $\mathbb{R}^m$;

• for each $t$ and $z \in \mathbb{R}^m$, the truncation $[z]_{U_t}$ is $\mathcal{F}_t$ measurable;

• $z^0 \in U_t$ eventually, i.e., for almost all $\omega$ there exists $t_0(\omega) < \infty$ such that $z^0 \in
U_t(\omega)$ whenever $t > t_0(\omega)$.

Assume that $Z_0 \in \mathbb{R}^m$ is some starting value and consider the procedure

$$Z_t = \left[ Z_{t-1} + \gamma_t(Z_{t-1}) \Psi_t(Z_{t-1}) \right]_{U_t}, \qquad t = 1, 2, \dots \qquad (2.1)$$

where $\Psi_t(z) = R_t(z) + \varepsilon_t(z)$, $U_t$ is admissible for $z^0$, $R_t(z)$, $\varepsilon_t(z)$, $\gamma_t(z)$ are random
fields defined above,

$$E\{\Psi_t(Z_{t-1}) \mid \mathcal{F}_{t-1}\} = R_t(Z_{t-1}), \qquad (2.2)$$
$$E\{\varepsilon_t^{T}(Z_{t-1}) \varepsilon_t(Z_{t-1}) \mid \mathcal{F}_{t-1}\} = \left[ E\{\varepsilon_t^{T}(z) \varepsilon_t(z) \mid \mathcal{F}_{t-1}\} \right]_{z = Z_{t-1}}, \qquad (2.3)$$

and the conditional expectations (2.2) and (2.3) are assumed to be finite.

Remark 2.1 Note that (2.2) in fact means that the sequence $\varepsilon_t(Z_{t-1})$ is a
martingale difference. Conditions (2.2) and (2.3) obviously hold if, e.g., the measurement
errors $\varepsilon_t(u)$ are independent random variables, or if they are state independent. In
general, since we assume that all conditional expectations are calculated as integrals
w.r.t. corresponding regular conditional probability measures (see the convention
below), these conditions can be checked using the disintegration formula (see, e.g.,
Theorem 5.4 in [W]).

Convention. 

• Everywhere in the present work convergence and all relations between random vari- 
ables are meant with probability one w.r.t. the measure P unless specified otherwise. 

• A sequence of random variables $(\zeta_t)_{t \ge 1}$ has some property eventually if for every
$\omega$ in a set $\Omega_0$ of $P$ probability 1, the realisation $\zeta_t(\omega)$ has this property for all $t$
greater than some $t_0(\omega) < \infty$.

• We assume that all conditional expectations are calculated as integrals w.r.t. cor- 
responding regular conditional probability measures. 

• We will also assume that the $\inf_{z \in U} h(z)$ of a real valued function $h(z)$ is 1
whenever $U = \emptyset$.

2.2 Convergence Lemmas 

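Lemma 2.2 Let $Z_t$ be a process defined by (2.1), (2.2) and (2.3), with an admissible
for $z^0 \in \mathbb{R}^m$ truncation sequence $U_t$. Let $V(u) : \mathbb{R}^m \to \mathbb{R}$ be a real valued
nonnegative function having continuous and bounded partial second derivatives. Denote

$$\Delta_t = Z_t - z^0$$

and suppose that the following conditions are satisfied.

(L)
$$V(\Delta_t) \le V\left( \Delta_{t-1} + \gamma_t(z^0 + \Delta_{t-1}) \Psi_t(z^0 + \Delta_{t-1}) \right)$$
eventually.

(S)
$$\sum_{t=1}^{\infty} \left( 1 + V(\Delta_{t-1}) \right)^{-1} \left[ \mathcal{N}_t(\Delta_{t-1}) \right]^{+} < \infty, \quad P\text{-a.s.} \qquad (2.4)$$

where

$$\mathcal{N}_t(u) = V'(u) \gamma_t(z^0 + u) R_t(z^0 + u) + \frac{1}{2} \sup_{v} \|V''(v)\| \, E\left\{ \|\gamma_t(z^0 + u) \Psi_t(z^0 + u)\|^2 \mid \mathcal{F}_{t-1} \right\}.$$

Then $V(Z_t - z^0)$ converges ($P$-a.s.) to a finite limit for any initial value $Z_0$.
Furthermore,

$$\sum_{t=1}^{\infty} \left[ \mathcal{N}_t(\Delta_{t-1}) \right]^{-} < \infty, \quad P\text{-a.s.} \qquad (2.5)$$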

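Proof. As always (see the convention in 2.1), convergence and all relations
between random variables are meant with probability one w.r.t. the measure $P$
unless specified otherwise.

From condition (L), using the Taylor expansion,

$$V(\Delta_t) \le V(\Delta_{t-1}) + V'(\Delta_{t-1}) \gamma_t(z^0 + \Delta_{t-1}) \Psi_t(z^0 + \Delta_{t-1})$$
$$\qquad + \frac{1}{2} \left[ \gamma_t(z^0 + \Delta_{t-1}) \Psi_t(z^0 + \Delta_{t-1}) \right]^{T} V''(\tilde\Delta_{t-1}) \, \gamma_t(z^0 + \Delta_{t-1}) \Psi_t(z^0 + \Delta_{t-1}),$$

where $\tilde\Delta_{t-1} \in \mathbb{R}^m$ is $\mathcal{F}_{t-1}$-measurable. Using (2.2) and (2.3) and taking the
conditional expectation w.r.t. $\mathcal{F}_{t-1}$ yields

$$E\{V(\Delta_t) \mid \mathcal{F}_{t-1}\} \le V(\Delta_{t-1}) + \mathcal{N}_t(\Delta_{t-1}). \qquad (2.6)$$

Using the obvious decomposition $\mathcal{N}_t(\Delta_{t-1}) = [\mathcal{N}_t(\Delta_{t-1})]^{+} - [\mathcal{N}_t(\Delta_{t-1})]^{-}$, we can
write

$$\mathcal{N}_t(\Delta_{t-1}) = \left( 1 + V(\Delta_{t-1}) \right)^{-1} [\mathcal{N}_t(\Delta_{t-1})]^{+} \left( 1 + V(\Delta_{t-1}) \right) - [\mathcal{N}_t(\Delta_{t-1})]^{-}$$
$$\qquad = B_t \left( 1 + V(\Delta_{t-1}) \right) - [\mathcal{N}_t(\Delta_{t-1})]^{-},$$

where

$$B_t = \left( 1 + V(\Delta_{t-1}) \right)^{-1} [\mathcal{N}_t(\Delta_{t-1})]^{+}.$$

Hence (2.6) implies that

$$E\{V(\Delta_t) \mid \mathcal{F}_{t-1}\} \le V(\Delta_{t-1})(1 + B_t) + B_t - [\mathcal{N}_t(\Delta_{t-1})]^{-}, \qquad (2.7)$$

eventually and, by (2.4),

$$\sum_{t=1}^{\infty} B_t < \infty. \qquad (2.8)$$

According to the Robbins-Siegmund Lemma (see e.g., [21]) inequalities (2.7) and
(2.8) imply that (2.5) holds and $V(\Delta_t)$ converges to some finite limit. $\diamond$

Everywhere below, we assume that the $\inf_{u \in U} v(u)$ of a function $v(u)$ is 1
whenever $U = \emptyset$.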

Lemma 2.3 Suppose that $V(Z_t - z^0)$ converges ($P$-a.s.) to a finite limit for any
initial value $Z_0$, where $Z_t$ and $V$ are defined in Lemma 2.2, and (2.5) holds. Suppose
also that for each $\varepsilon \in (0, 1)$,

$$\inf_{\substack{\|u\| \ge \varepsilon \\ z^0 + u \in U_t}} V(u) > \delta > 0 \qquad (2.9)$$

eventually, for some $\delta$. Suppose also that
(C) For each $\varepsilon \in (0, 1)$,

$$\sum_{t=1}^{\infty} \inf_{u} \left[ \mathcal{N}_t(u) \right]^{-} = \infty, \quad P\text{-a.s.}$$

where the infimum is taken over the set $\{u : \varepsilon \le V(u) \le 1/\varepsilon; \; z^0 + u \in U_{t-1}\}$.

Then $Z_t \to z^0$ ($P$-a.s.), for any initial value $Z_0$.

Proof. As always (see the convention in 2.1), convergence and all relations
between random variables are meant with probability one w.r.t. the measure $P$
unless specified otherwise. Suppose that $V(\Delta_t) \to r \ge 0$ and there exists a set $A$ with
$P(A) > 0$, such that $r > 0$ on $A$. Then there exist $\varepsilon > 0$ and (possibly random) $t_0$,
such that if $t \ge t_0$, then $\varepsilon \le V(\Delta_{t-1}) \le 1/\varepsilon$ on $A$. Note also that $z^0 + \Delta_{t-1} = Z_{t-1} \in U_{t-1}$.
By (C), these would imply that

$$\sum_{s=t_0}^{\infty} \left[ \mathcal{N}_s(\Delta_{s-1}) \right]^{-} \ge \sum_{s=t_0}^{\infty} \inf_{u} \left[ \mathcal{N}_s(u) \right]^{-} = \infty$$

on the set $A$, where the infimums are taken over the sets specified in condition (C).
This contradicts (2.5). Hence, $r = 0$ and so, $V(\Delta_t) \to 0$. Now, $\Delta_t \to 0$ follows
from (2.9) by contradiction. Indeed, suppose that $\Delta_t \not\to 0$ on a set, say $B$, of positive
probability. Then, for any fixed $\omega$ from this set, there would exist a sequence $t_k \to \infty$
such that $\|\Delta_{t_k}\| \ge \varepsilon$ for some $\varepsilon > 0$, and (2.9) would imply that $V(\Delta_{t_k}) > \delta > 0$ for
large $k$-s, which contradicts the $P$-a.s. convergence $V(\Delta_t) \to 0$. $\diamond$
2.3 Sufficient conditions

Everywhere in this subsection we assume that $\gamma_t$ is a state independent (i.e., constant
w.r.t. $z$) non-negative scalar predictable process.

Corollary 2.4 Let $Z_t$ be a process defined by (2.1), (2.2) and (2.3), with an
admissible for $z^0 \in \mathbb{R}^m$ truncation sequence $U_t$. Suppose also that $\gamma_t$ is a non-negative
predictable scalar process and

(C1)
$$\sup_{z \in U_{t-1}} \frac{\left[ 2 (z - z^0)^{T} R_t(z) + \gamma_t E\{\|\Psi_t(z)\|^2 \mid \mathcal{F}_{t-1}\} \right]^{+}}{1 + \|z - z^0\|^2} \le q_t \qquad (2.10)$$

eventually, where

$$\sum_{t=1}^{\infty} q_t \gamma_t < \infty, \quad P\text{-a.s.}$$

Then $\|Z_t - z^0\|$ converges ($P$-a.s.) to a finite limit.

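Proof. Let us show that the conditions of Lemma 2.2 are satisfied with $V(u) =
u^{T} u = \|u\|^2$ and the step-size sequence $\gamma_t(z) = \gamma_t \mathbf{1}$. Since $z^0 \in U_t$ for large $t$-s, the
definition of the truncation (see 2.1) implies that

$$\|Z_t - z^0\| \le \|Z_{t-1} + \gamma_t \Psi_t(Z_{t-1}) - z^0\|,$$

eventually. Therefore (L) holds. Then, $V'(u) = 2u^{T}$ and $V''(u) = 2 \cdot \mathbf{1}$, and so, for
the process $\mathcal{N}_t(u)$ in (2.4) we have

$$\mathcal{N}_t(u) = 2 u^{T} \gamma_t R_t(z^0 + u) + \gamma_t^2 E\{\|\Psi_t(z^0 + u)\|^2 \mid \mathcal{F}_{t-1}\} \qquad (2.11)$$

and

$$\frac{[\mathcal{N}_t(\Delta_{t-1})]^{+}}{1 + V(\Delta_{t-1})} = \gamma_t \frac{\left[ 2 \Delta_{t-1}^{T} R_t(z^0 + \Delta_{t-1}) + \gamma_t E\{\|\Psi_t(z^0 + \Delta_{t-1})\|^2 \mid \mathcal{F}_{t-1}\} \right]^{+}}{1 + \|\Delta_{t-1}\|^2}.$$

Since $z^0 + \Delta_{t-1} = Z_{t-1} \in U_{t-1}$, (S) follows from (C1). $\diamond$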

Corollary 2.5 Suppose that the conditions of Corollary 2.4 hold and
(C2) for each $\varepsilon \in (0, 1)$,

$$\sum_{t=1}^{\infty} \inf_{u} \left[ \mathcal{N}_t(u) \right]^{-} = \infty, \quad P\text{-a.s.}$$

where

$$\mathcal{N}_t(u) = 2 u^{T} \gamma_t R_t(z^0 + u) + \gamma_t^2 E\{\|\Psi_t(z^0 + u)\|^2 \mid \mathcal{F}_{t-1}\}$$

and the infimum is taken over the set $\{u : \varepsilon \le \|u\| \le 1/\varepsilon; \; z^0 + u \in U_{t-1}\}$.

Then $Z_t \to z^0$ ($P$-a.s.), for any initial value $Z_0$.

Proof. Let us show that the conditions of Lemma 2.3 are satisfied with $V(u) =
u^{T} u = \|u\|^2$ and $\gamma_t(z) = \gamma_t \mathbf{1}$. It follows from the proof of Corollary 2.4 that all the
conditions of Lemma 2.2 hold with $V(u) = u^{T} u$. Hence, $\|Z_t - z^0\|$ converges and
(2.5) holds. Since

$$\inf_{\|u\| \ge \varepsilon} \|u\|^2 \ge \varepsilon^2 > 0,$$

condition (2.9) also trivially holds. Finally, (C) is a consequence of (C2). $\diamond$

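Corollary 2.6 Suppose that $Z_t$ is a process defined by (2.1), (2.2) and (2.3), with
an admissible for $z^0 \in \mathbb{R}^m$ truncation sequence $U_t$ and

(1)
$$(z - z^0)^{T} R_t(z) \le 0 \quad \text{for any } z \in U_t,$$
eventually;

(2)
$$\sup_{z \in U_{t-1}} \frac{\|R_t(z)\|^2}{1 + \|z - z^0\|^2} \le r_t$$
eventually, where
$$\sum_{t=1}^{\infty} r_t \gamma_t^2 < \infty, \quad P\text{-a.s.};$$

(3)
$$\sup_{z \in U_{t-1}} \frac{E\{\|\varepsilon_t(z)\|^2 \mid \mathcal{F}_{t-1}\}}{1 + \|z - z^0\|^2} \le e_t$$
eventually, where
$$\sum_{t=1}^{\infty} e_t \gamma_t^2 < \infty, \quad P\text{-a.s.}$$

Then $\|Z_t - z^0\|$ converges ($P$-a.s.) to a finite limit.

Proof. Using condition (1),

$$\left[ 2 (z - z^0)^{T} R_t(z) + \gamma_t E\{\|\Psi_t(z)\|^2 \mid \mathcal{F}_{t-1}\} \right]^{+} \le \gamma_t E\{\|\Psi_t(z)\|^2 \mid \mathcal{F}_{t-1}\}$$

eventually. Since $E\{\varepsilon_t(z) \mid \mathcal{F}_{t-1}\} = 0$ and $R_t(z)$ is $\mathcal{F}_{t-1}$-measurable, we have

$$E\{\|\Psi_t(z)\|^2 \mid \mathcal{F}_{t-1}\} = \|R_t(z)\|^2 + E\{\|\varepsilon_t(z)\|^2 \mid \mathcal{F}_{t-1}\}. \qquad (2.12)$$

So, by conditions (2) and (3), the left hand side of (2.10) does not exceed $(r_t + e_t)\gamma_t$.
Hence conditions of Corollary 2.4 hold with $q_t = (r_t + e_t)\gamma_t$ and the result follows.
$\diamond$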

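Corollary 2.7 Suppose that the conditions of Corollary 2.6 are satisfied and
(CC) for each $\varepsilon \in (0, 1)$,

$$\inf_{\substack{\varepsilon \le \|z - z^0\| \le 1/\varepsilon \\ z \in U_{t-1}}} -(z - z^0)^{T} R_t(z) \ge \nu_t \qquad (2.13)$$

eventually, where

$$\sum_{t=1}^{\infty} \nu_t \gamma_t = \infty, \quad P\text{-a.s.}$$

Then $Z_t$ converges ($P$-a.s.) to $z^0$.

Proof. It follows from the proof of Corollary 2.6 that conditions of Corollary 2.4
hold. Let us prove that (C2) of Corollary 2.5 holds. Using the obvious inequality
$[a]^{-} \ge -a$, we have

$$[\mathcal{N}_t(u)]^{-} \ge -2 \gamma_t u^{T} R_t(z^0 + u) - \gamma_t^2 E\{\|\Psi_t(z^0 + u)\|^2 \mid \mathcal{F}_{t-1}\}.$$

Using (2.12) and conditions (2) and (3) of Corollary 2.6 and taking the supremum of
the conditional expectation above over the set $\{u : \varepsilon \le \|u\| \le 1/\varepsilon; \; z^0 + u \in U_{t-1}\}$,
we obtain

$$\sup E\{\|\Psi_t(z^0 + u)\|^2 \mid \mathcal{F}_{t-1}\} \le \sup \frac{E\{\|\Psi_t(z^0 + u)\|^2 \mid \mathcal{F}_{t-1}\}}{1 + \|u\|^2} \, \sup \left( 1 + \|u\|^2 \right) \le (r_t + e_t)\left( 1 + 1/\varepsilon^2 \right).$$

Then, by (2.13), taking the infimum over the same set,

$$\inf [\mathcal{N}_t(u)]^{-} \ge 2 \gamma_t \nu_t - \gamma_t^2 (r_t + e_t)\left( 1 + 1/\varepsilon^2 \right).$$

Condition (C2) is now immediate from (CC) and conditions (2) and (3) of Corollary
2.6. Hence, by Corollary 2.5, $Z_t$ converges ($P$-a.s.) to $z^0$. $\diamond$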

Remark 2.8 Suppose that $\varepsilon_t$ is an error term which does not depend on $z$ and
denote

$$\sigma_t^2 = E\{\|\varepsilon_t\|^2 \mid \mathcal{F}_{t-1}\}.$$

Then condition (3) holds if

$$\sum_{t=1}^{\infty} \sigma_t^2 \gamma_t^2 < \infty, \quad P\text{-a.s.} \qquad (2.14)$$

This shows that the requirements on the error terms are quite weak. In particular,
the conditional variances do not have to be bounded w.r.t. $t$.
Remark 2.9 As was mentioned in the introduction, our procedure is similar to
the one considered in [1]. Let us compare these two in the cases when the
comparisons are possible. Hence, consider truncations on increasing non-random sets,
non-random and homogeneous $R_t(u) = R(u)$, and scalar and state-independent $\gamma_t$ in
Corollaries 2.6 and 2.7. Also, in Theorem 2 of [1] take $\beta_n = 0$. Then the resulting
two sets of conditions are in fact equivalent. In particular, in terms of notation in

1 _ 2 _ 

Now it is clear that conditions 2. and 3. in Theorem 2 of [1] are equivalent to (3)
and (2) respectively in Corollary 2.6. Note that although condition (CC) in Corollary 2.7 is
formally more general than condition 2. in Theorem 2 of [1], in any meaningful
applications they are equivalent.

2.4 Examples 

Example 1 Let $l$ be an odd integer and

$$R(z) = -(z - z^0)^{l},$$

$z^0 \in \mathbb{R}$. Consider a truncation sequence $[-\alpha_t, \alpha_t]$, where $\alpha_t \to \infty$ is a sequence of
positive numbers. Suppose that

$$\sum_{t=1}^{\infty} \gamma_t = \infty \quad \text{and} \quad \sum_{t=1}^{\infty} \alpha_{t-1}^{2l} \gamma_t^2 < \infty.$$

Then, provided that the measurement errors satisfy (2.14) (or condition (3) of
Corollary 2.6 in the case of state-dependent errors), the truncated procedure

$$Z_t = \left[ Z_{t-1} + \gamma_t \left( R(Z_{t-1}) + \varepsilon_t \right) \right]_{-\alpha_t}^{\alpha_t}, \qquad t = 1, 2, \dots$$

converges a.s. to $z^0$.

Indeed, condition (1) of Corollary 2.6 trivially holds. For large $t$'s,

$$\sup_{z \in [-\alpha_{t-1}, \alpha_{t-1}]} \frac{\|R(z)\|^2}{1 + \|z - z^0\|^2} \le \sup_{z \in [-\alpha_{t-1}, \alpha_{t-1}]} (z - z^0)^{2l} \le 4^{l} \alpha_{t-1}^{2l},$$

which implies condition (2) of Corollary 2.6. Condition (CC) of Corollary 2.7 also
trivially holds with $\nu_t = \varepsilon^{l+1}$.

For example, if the degree of the polynomial is known to be $l$ (or at most $l$), and
$\gamma_t = 1/t$, then one can take $\alpha_t = C t^{\frac{1}{2l} - \delta}$, where $C$ and $\delta$ are some positive constants
and $\delta < 1/(2l)$. One can also take a truncation sequence which is independent of $l$, e.g.,
$\alpha_t = C \log t$, where $C$ is a positive constant.
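
The truncated procedure of Example 1 is easy to simulate. The sketch below is an illustration under assumed settings rather than a computation from the paper: the function name `example1_path`, the root $z^0 = 2$, the standard Gaussian errors, the starting point $Z_0 = 0$ and the bound $C\log(t+1)$ (used in place of $C \log t$ so that the first interval is not degenerate) are all choices made here.

```python
import math
import random


def example1_path(z0, l, n_steps, c=1.0, noise_sd=1.0, seed=0):
    """Simulate Z_t = [Z_{t-1} + gamma_t (R(Z_{t-1}) + eps_t)]_{-a_t}^{a_t}.

    R(z) = -(z - z0)**l with l odd, gamma_t = 1/t and a_t = c * log(t + 1).
    The errors eps_t are i.i.d. N(0, noise_sd^2) (an assumption).
    """
    rng = random.Random(seed)
    z = 0.0
    for t in range(1, n_steps + 1):
        a_t = c * math.log(t + 1.0)
        measurement = -(z - z0) ** l + rng.gauss(0.0, noise_sd)
        z = z + measurement / t
        z = min(max(z, -a_t), a_t)      # truncation to [-a_t, a_t]
    return z


z_linear = example1_path(z0=2.0, l=1, n_steps=20000)
z_cubic = example1_path(z0=2.0, l=3, n_steps=100000)
```

For $l \ge 3$ the derivative of $R$ vanishes at $z^0$, so the approach to the root can be expected to be much slower than in the linear case $l = 1$; the run lengths above were chosen with that in mind.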
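Example 2 Let $X_1, X_2, \dots$ be i.i.d. Gamma($\theta$, 1), $\theta > 0$. Then the
common probability density function is

$$f(x, \theta) = \frac{1}{\Gamma(\theta)} x^{\theta - 1} e^{-x}, \qquad \theta > 0, \; x > 0,$$

where $\Gamma(\theta)$ is the Gamma function. Then

$$\frac{f'(x, \theta)}{f(x, \theta)} = \log x - \frac{d}{d\theta} \log \Gamma(\theta) = \log x - \log'\Gamma(\theta), \qquad i(\theta) = \log''\Gamma(\theta),$$

where $i(\theta)$ is the one-step Fisher information. Then a likelihood type recursive
estimation procedure (see also (1.2)) can be defined as

$$\hat\theta_t = \left[ \hat\theta_{t-1} + \frac{1}{t \, \log''\Gamma(\hat\theta_{t-1})} \left( \log X_t - \log'\Gamma(\hat\theta_{t-1}) \right) \right]_{\alpha_t}^{\beta_t}, \qquad t = 1, 2, \dots \qquad (2.15)$$

where $\alpha_t \downarrow 0$ and $\beta_t \uparrow \infty$ are sequences of positive numbers.

Everywhere in this example, $\mathcal{F}_t$ is the sigma algebra generated by $X_1, \dots, X_t$,
$P^{\theta}$ is the family of corresponding measures, and $\theta > 0$ is an arbitrary but fixed value
of the parameter.

Let us rewrite (2.15) in the form of the stochastic approximation, i.e.,

$$\hat\theta_t = \left[ \hat\theta_{t-1} + \frac{1}{t} \left( R(\hat\theta_{t-1}) + \varepsilon_t(\hat\theta_{t-1}) \right) \right]_{\alpha_t}^{\beta_t}, \qquad t = 1, 2, \dots \qquad (2.16)$$

where (see Section 3 for details)

$$R(u) = R^{\theta}(u) = \frac{1}{\log''\Gamma(u)} E^{\theta}\{\log X_t - \log'\Gamma(u)\} = \frac{1}{\log''\Gamma(u)} \left( \log'\Gamma(\theta) - \log'\Gamma(u) \right)$$

and

$$\varepsilon_t(u) = \frac{1}{\log''\Gamma(u)} \left( \log X_t - \log'\Gamma(u) \right) - R(u).$$

Since $E^{\theta}\{\log X_t \mid \mathcal{F}_{t-1}\} = E^{\theta}\{\log X_t\} = \log'\Gamma(\theta)$ and $\hat\theta_{t-1}$ is $\mathcal{F}_{t-1}$-measurable,
we have $E^{\theta}\{\varepsilon_t(\hat\theta_{t-1}) \mid \mathcal{F}_{t-1}\} = 0$ and hence (2.2) holds. Since $E^{\theta}\{\log^2 X_t\} < \infty$,
condition (2.3) can be checked in a similar way. Obviously, $R(\theta) = 0$, and since
$\log'\Gamma$ is increasing (see, e.g., [32], 12.16), condition (1) of Corollary 2.6 holds with
$z^0 = \theta$. Based on the well known properties of the logarithmic derivatives of the
gamma function, it is not difficult to show (see Section 3) that if

$$\sum_{t=1}^{\infty} \frac{\alpha_{t-1}^2}{t} = \infty \quad \text{and} \quad \sum_{t=1}^{\infty} \frac{\log^2 \alpha_{t-1} + \log^2 \beta_{t-1}}{t^2} < \infty, \qquad (2.17)$$

then all the conditions of Corollaries 2.6 and 2.7 hold and therefore, $\hat\theta_t$ is consistent,
i.e.,

$$\hat\theta_t \to \theta \quad \text{as } t \to \infty \quad (P^{\theta}\text{-a.s.}).$$

For instance, the sequences

$$\alpha_t = C_1 (\log(t + 2))^{-1/2} \quad \text{and} \quad \beta_t = C_2 (t + 2)$$

with some positive constants $C_1$ and $C_2$, obviously satisfy (2.17).

Note also that, since $\theta \in (0, \infty)$, it may seem unnecessary to use the upper
truncations $\beta_t < \infty$. However, without upper truncations (i.e. if $\beta_t = \infty$), the
standard restriction on the growth does not hold. Also, with $\beta_t = \infty$ the procedure
fails condition (2) of Corollary 2.6 (see (3.7)).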
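
The procedure (2.15) can be tried out numerically. The following sketch is an illustration only: the helper names `digamma`, `trigamma` and `gamma_recursive_mle` are hypothetical, the two special functions are approximated by the standard recurrence and asymptotic series instead of a library call, and the true value $\theta = 3$, the constants $C_1 = C_2 = 1$ and the starting value are assumptions made for the demonstration.

```python
import math
import random


def digamma(x):
    """log'Gamma(x) for x > 0: upward recurrence, then asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x              # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series


def trigamma(x):
    """log''Gamma(x) for x > 0: upward recurrence, then asymptotic series."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)        # psi'(x) = psi'(x + 1) + 1/x^2
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv2 * (1.0 / 6.0 - inv2 * (1.0 / 30.0
                                         - inv2 * (1.0 / 42.0 - inv2 / 30.0)))
    return result + inv + 0.5 * inv2 + inv * series


def gamma_recursive_mle(theta, n_steps, c1=1.0, c2=1.0, start=1.0, seed=0):
    """Truncated recursion (2.15) with alpha_t = c1 / sqrt(log(t + 2)) and
    beta_t = c2 * (t + 2), run on simulated Gamma(theta, 1) observations."""
    rng = random.Random(seed)
    est = start
    for t in range(1, n_steps + 1):
        x = rng.gammavariate(theta, 1.0)
        alpha_t = c1 / math.sqrt(math.log(t + 2.0))
        beta_t = c2 * (t + 2.0)
        est = est + (math.log(x) - digamma(est)) / (t * trigamma(est))
        est = min(max(est, alpha_t), beta_t)    # truncation to [alpha_t, beta_t]
    return est


theta_hat = gamma_recursive_mle(theta=3.0, n_steps=20000)
```

The lower truncation keeps the estimate strictly positive, so `digamma` and `trigamma` are only ever evaluated where they are defined.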

Example 3 Consider an AR(1) process

$$X_t = \theta X_{t-1} + \xi_t, \qquad (2.18)$$

where $\xi_t$ is a sequence of random variables with mean zero. Taking

$$\Psi_t(z) = X_{t-1} (X_t - z X_{t-1}),$$

$\gamma_t(z) = \gamma_t = I_t^{-1}$ with $I_t = I_0 + \sum_{s=1}^{t} X_{s-1}^2$, and $U_t = \mathbb{R}$, procedure (2.1) reduces to the
recursive least squares (LS) estimator of $\theta$, i.e.,

$$\hat\theta_t = \hat\theta_{t-1} + I_t^{-1} X_{t-1} \left( X_t - \hat\theta_{t-1} X_{t-1} \right), \qquad (2.19)$$
$$I_t = I_{t-1} + X_{t-1}^2, \qquad t = 1, 2, \dots$$

where $\hat\theta_0$ and $I_0 > 0$ are any starting points.

For simplicity let us assume that $\xi_t$ is a sequence of i.i.d. r.v.'s with mean zero
and variance 1. Consistency of (2.19) can be derived from our results for any $\theta \in \mathbb{R}$
and without any further moment assumptions on the innovation process $\xi_t$. Indeed,
assume that $\theta$ is an arbitrary but fixed value of the parameter. Then, using (2.18),
we obtain

$$X_t - \hat\theta_{t-1} X_{t-1} = \xi_t + X_{t-1} (\theta - \hat\theta_{t-1}),$$

and (2.19) can be rewritten as

$$\hat\theta_t = \hat\theta_{t-1} + I_t^{-1} \left( X_{t-1}^2 (\theta - \hat\theta_{t-1}) + X_{t-1} \xi_t \right). \qquad (2.20)$$

So, (2.20) is a SA procedure with

$$R_t(z) = X_{t-1}^2 (\theta - z), \qquad (2.21)$$

$\varepsilon_t(z) = \varepsilon_t = X_{t-1} \xi_t$, $\gamma_t = I_t^{-1}$ and $U_t = \mathbb{R}$. Let us check condition (C1) of Corollary
2.4 with $z^0 = \theta$ and $U_t = \mathbb{R}$. Since $E^{\theta}\{\varepsilon_t \mid \mathcal{F}_{t-1}\} = 0$ and $R_t(z)$ is $\mathcal{F}_{t-1}$ measurable,
(2.2) and (2.3) trivially hold. Also,

$$E^{\theta}\{\|\Psi_t(z)\|^2 \mid \mathcal{F}_{t-1}\} = \|R_t(z)\|^2 + E^{\theta}\{\|\varepsilon_t\|^2 \mid \mathcal{F}_{t-1}\} = X_{t-1}^4 (\theta - z)^2 + X_{t-1}^2. \qquad (2.22)$$

Denoting the expression in the square brackets in (2.10) by $w_t(z)$ (with $z^0 = \theta$), we
obtain

$$w_t(z) = -2 X_{t-1}^2 (z - \theta)^2 + I_t^{-1} X_{t-1}^4 (\theta - z)^2 + I_t^{-1} X_{t-1}^2 \qquad (2.23)$$
$$\qquad = -\delta X_{t-1}^2 (z - \theta)^2 - X_{t-1}^2 (z - \theta)^2 \left( (2 - \delta) - I_t^{-1} X_{t-1}^2 \right) + I_t^{-1} X_{t-1}^2 \qquad (2.24)$$

for some $0 < \delta < 1$. Since $I_t^{-1} X_{t-1}^2 \le 1$, the positive part of the above expression
does not exceed $I_t^{-1} X_{t-1}^2$. This implies that (2.10) holds with $q_t = I_t^{-1} X_{t-1}^2$. Now,
note that if $d_t$ is a nondecreasing sequence of positive numbers such that $d_t \to +\infty$
and $\Delta d_t = d_t - d_{t-1}$, then $\sum_{t=1}^{\infty} \Delta d_t / d_t = +\infty$ and $\sum_{t=1}^{\infty} \Delta d_t / d_t^2 < +\infty$. So, for
$X_{t-1}^2 = \Delta I_t$, since $I_t \to \infty$ for any $\theta \in \mathbb{R}$ (see, e.g., Shiryayev [29], Ch.VII, §5), we
have

$$\sum_{t=1}^{\infty} I_t^{-2} X_{t-1}^2 < \infty \quad \text{and} \quad \sum_{t=1}^{\infty} I_t^{-1} X_{t-1}^2 = \infty. \qquad (2.25)$$

Hence, taking $q_t \gamma_t = I_t^{-2} X_{t-1}^2$, (C1) follows. Therefore, $(\hat\theta_t - \theta)^2$ converges to a finite
limit. To show convergence to $\theta$, let us check condition (C2) of Corollary 2.5 with
$z^0 = \theta$ and $U_t = \mathbb{R}$. Using (2.11) and (2.22), we have

$$\mathcal{N}_t(u) = -2 I_t^{-1} X_{t-1}^2 u^2 + I_t^{-2} X_{t-1}^4 u^2 + I_t^{-2} X_{t-1}^2 = I_t^{-1} w_t(\theta + u),$$

where $w_t$ is defined in (2.23). Since the middle term in (2.24) is non-positive, using
the obvious inequality $[a]^{-} \ge -a$, we can write

$$[\mathcal{N}_t(u)]^{-} \ge \delta I_t^{-1} X_{t-1}^2 u^2 - I_t^{-2} X_{t-1}^2,$$

and

$$\sum_{t=1}^{\infty} \inf_{\varepsilon \le |u| \le 1/\varepsilon} [\mathcal{N}_t(u)]^{-} = \infty$$

now follows from (2.25). So, by Corollary 2.5, $\hat\theta_t \to \theta$ ($P^{\theta}$-a.s.).

Note that the convergence of the LS estimator is well known under these
assumptions (see e.g., [29], Ch.VII, §5). This example is presented to demonstrate
that the assumptions made here are minimal. That is, in well known model cases,
the results of the paper do not assume any additional restrictions.
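
The recursion (2.19) can be checked on simulated data. The sketch below is illustrative: the function name `recursive_ls_ar1`, the true value $\theta = 0.5$, the Gaussian innovations and the starting values $\hat\theta_0 = 0$, $I_0 = 1$ are assumptions. Multiplying (2.19) by $I_t$ gives $I_t \hat\theta_t = I_{t-1} \hat\theta_{t-1} + X_{t-1} X_t$, so the recursion has the closed form $\hat\theta_t = (I_0 \hat\theta_0 + \sum_{s=1}^{t} X_{s-1} X_s) / I_t$, which the code uses as a cross-check.

```python
import random


def recursive_ls_ar1(xs, theta_start=0.0, info_start=1.0):
    """Recursive LS (2.19) for X_t = theta * X_{t-1} + xi_t.

    xs is the observed path [X_0, X_1, ..., X_T]; info_start plays the role
    of I_0 > 0 and theta_start the role of the initial estimate.
    """
    theta_hat = theta_start
    info = info_start
    for t in range(1, len(xs)):
        info += xs[t - 1] ** 2                     # I_t = I_{t-1} + X_{t-1}^2
        residual = xs[t] - theta_hat * xs[t - 1]   # X_t - theta_{t-1} X_{t-1}
        theta_hat += xs[t - 1] * residual / info
    return theta_hat, info


rng = random.Random(2)
theta_true = 0.5
path = [0.0]
for _ in range(20000):
    path.append(theta_true * path[-1] + rng.gauss(0.0, 1.0))

theta_hat, info = recursive_ls_ar1(path, theta_start=0.0, info_start=1.0)

# Closed form implied by the recursion (with theta_0 = 0 and I_0 = 1).
cross = sum(path[s - 1] * path[s] for s in range(1, len(path)))
closed_form = (1.0 * 0.0 + cross) / info
```

Here the step size $I_t^{-1}$ is random and data driven, which is the scalar case of the random step-size sequences allowed in (2.1).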

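3 Appendix

We will need the following properties of the Gamma function (see, e.g., [32], 12.16).
$\log'\Gamma$ is increasing, $\log''\Gamma$ is decreasing and continuous, and

$$\log''\Gamma(x) = \frac{1}{x^2} + \sum_{n=1}^{\infty} \frac{1}{(x + n)^2}.$$

The latter implies that

$$\log''\Gamma(x) \le \frac{1}{x^2} + \sum_{n=1}^{\infty} \int_{n-1}^{n} \frac{dz}{(x + z)^2} = \frac{1}{x^2} + \frac{1}{x} = \frac{1 + x}{x^2} \qquad (3.1)$$

and

$$\log''\Gamma(x) \ge \sum_{n=0}^{\infty} \int_{n}^{n+1} \frac{dz}{(x + z)^2} = \frac{1}{x}. \qquad (3.2)$$

Also (see 0, 12.5.4),

$$\log'\Gamma(x) \le \ln(x). \qquad (3.3)$$

Then,

$$E^{\theta}\{\log X_1\} = \log'\Gamma(\theta) \quad \text{and} \quad E^{\theta}\{(\log X_1)^2\} = \log''\Gamma(\theta) + (\log'\Gamma(\theta))^2 \qquad (3.4)$$

and

$$E^{\theta}\{(\log X_1 - \log'\Gamma(\theta))^2\} = \log''\Gamma(\theta).$$

Let us show that the conditions of Corollary 2.6 hold. Since

$$\Psi_t(u) = \frac{1}{\log''\Gamma(u)} \left( \log X_t - \log'\Gamma(u) \right),$$

using (3.4) and (3.2) we obtain

$$\frac{E^{\theta}\{\|\Psi_t(u)\|^2 \mid \mathcal{F}_{t-1}\}}{1 + \|u - \theta\|^2} = \frac{\log''\Gamma(\theta) + (\log'\Gamma(\theta) - \log'\Gamma(u))^2}{(\log''\Gamma(u))^2 (1 + \|u - \theta\|^2)} \qquad (3.5)$$
$$\qquad \le \frac{u^2}{1 + (u - \theta)^2} \left( \log''\Gamma(\theta) + (\log'\Gamma(\theta) - \log'\Gamma(u))^2 \right).$$

Now, $u^2 / (1 + (u - \theta)^2) \le C$. Here and further on in this subsection, $C$ denotes
various constants which may depend on $\theta$. So, using (3.3) we obtain

$$\frac{E^{\theta}\{\|\Psi_t(u)\|^2 \mid \mathcal{F}_{t-1}\}}{1 + \|u - \theta\|^2} \le C (1 + \log^2(u)).$$

For large $t$'s, since $\alpha_t < 1 < \beta_t$, we have

$$\sup_{u \in [\alpha_t, \beta_t]} \log^2(u) \le \sup_{\alpha_t \le u \le 1} \log^2(u) + \sup_{1 < u \le \beta_t} \log^2(u) \le \log^2 \alpha_t + \log^2 \beta_t.$$

Condition (2) of Corollary 2.6 is now immediate from the second part of (2.17). It
remains to check that (CC) of Corollary 2.7 holds. Indeed,

$$-(u - \theta) R(u) = \frac{(u - \theta) \left( \log'\Gamma(u) - \log'\Gamma(\theta) \right)}{\log''\Gamma(u)}.$$

Since $\log'\Gamma$ is increasing and $\log''\Gamma$ is decreasing and continuous, we have that for
each $\varepsilon \in (0, 1)$,

$$\inf_{\substack{\varepsilon \le \|u - \theta\| \le 1/\varepsilon \\ u \in U_{t-1}}} -(u - \theta) R(u) \ge \frac{\inf_{\varepsilon \le \|u - \theta\| \le 1/\varepsilon} \left( \log'\Gamma(u) - \log'\Gamma(\theta) \right) (u - \theta)}{\log''\Gamma(\alpha_{t-1})} \ge \frac{C}{\log''\Gamma(\alpha_{t-1})} \qquad (3.6)$$

where $C$ is a constant that may depend on $\varepsilon$ and $\theta$. Since $\alpha_{t-1} < 1$ for large $t$'s, it
follows from (3.1) that $1 / \log''\Gamma(\alpha_{t-1}) \ge \alpha_{t-1}^2 / 2$. Condition (CC) of Corollary 2.7 is now
immediate from the first part of (2.17).

Note that with $\beta_t = \infty$ the procedure fails condition (2) of Corollary 2.6. Indeed,
(3.5) and (3.1) imply that

$$\sup_{\alpha_t \le u} \frac{E^{\theta}\{\|\Psi_t(u)\|^2 \mid \mathcal{F}_{t-1}\}}{1 + (u - \theta)^2} \ge \sup_{\alpha_t \le u} \frac{\left( \log''\Gamma(\theta) + (\log'\Gamma(\theta) - \log'\Gamma(u))^2 \right) u^4}{(1 + u)^2 (1 + (u - \theta)^2)} = \infty. \qquad (3.7)$$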